Scalable Genre and Tag Prediction with Spectral Covariance

Authors

  • James Bergstra
  • Michael I. Mandel
  • Douglas Eck
Abstract

Cepstral analysis is effective in separating source from filter in vocal and monophonic [pitched] recordings, but is it a good general-purpose framework for working with music audio? We evaluate covariance in spectral features as an alternative to means and variances in cepstral features (particularly MFCCs) as summaries of frame-level features. We find that spectral covariance is more effective than mean, variance, and covariance statistics of MFCCs for genre and social tag prediction. Support for our model comes from strong and state-of-the-art performance on the GTZAN genre dataset, MajorMiner, and MagnaTagatune. Our classification strategy based on linear classifiers is easy to implement, exhibits very little sensitivity to hyper-parameters, trains quickly (even for web-scale datasets), is fast to apply, and offers competitive performance in genre and tag prediction.
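
As a rough illustration of the summary statistic the abstract describes, the sketch below reduces a clip's frame-level feature vectors to the upper triangle of their sample covariance matrix. It is not the authors' pipeline: the function name `covariance_summary`, the choice of the unbiased (n - 1) estimator, and the assumption that the frames are already spectral feature vectors (for example log-magnitude spectra) are all illustrative.

```python
from itertools import combinations_with_replacement


def covariance_summary(frames):
    """Summarize a clip by the covariance of its frame-level features.

    `frames` is a list of equal-length feature vectors, one per audio
    frame (assumed here to be precomputed spectral features). Returns
    the upper triangle of the sample covariance matrix, flattened in
    row-major order: (0,0), (0,1), ..., (1,1), (1,2), ...
    """
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames to estimate covariance")
    dim = len(frames[0])

    # Per-dimension mean over all frames.
    mean = [sum(frame[d] for frame in frames) / n for d in range(dim)]

    # Center every frame around the mean.
    centered = [[frame[d] - mean[d] for d in range(dim)] for frame in frames]

    # The covariance matrix is symmetric, so keep only entries with i <= j.
    summary = []
    for i, j in combinations_with_replacement(range(dim), 2):
        summary.append(sum(c[i] * c[j] for c in centered) / (n - 1))
    return summary


# Two frames with two features each give three covariance entries.
print(covariance_summary([[1.0, 2.0], [3.0, 6.0]]))  # [2.0, 4.0, 8.0]
```

A vector like this has a fixed length regardless of clip duration, so it could be passed directly to a linear classifier of the kind the abstract mentions.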

Similar resources

Mirex 2011: Audio Tag Classification Using Weighted-vote Nearest Neighbor Classification

In this long abstract, we present an algorithm for automatically annotating music with tags that is fast, scalable and relatively easy to implement. It uses acoustic similarity for propagating tags among audio items. The algorithm makes use of a variety of acoustical features, ranging from spectral features to rhythm, tonal and high-level features (such as mood, genre, gender). These features a...

Labrosa’s Audio Classification Submissions

We have submitted a system to MIREX 2008’s audio music classification tasks. It employs the spectral features described in [2] in addition to novel stereo-based features. For the n-way audio classification tasks (artist, classical composer, genre, latin genre, and mood identification) it uses a DAGSVM to perform classification. For the tag classification task, it uses a simple binary SVM with P...

Draft: a Refined Block-level Feature Set for Classification, Similarity and Tag Prediction

In our submission we use a set of so-called block-level features (BLF) for three different tasks, namely genre classification, tag classification and music similarity estimation. Compared to the submission in 2010, two additional features were added to the feature set. This abstract gives an overview of the feature set and presents some specific details of the submitted algorithms.

Two-dimensional linear prediction and spectral estimation on a polar raster

A zero-mean homogeneous random field is defined on a discrete polar raster. Given sample values inside a disk of finite radius, we wish to estimate the field’s power spectral density using linear prediction. Issues arising here include estimation of covariance lags and extendibility of a finite set of lag estimates into a positive semidefinite covariance extension (required for a meaningful spe...

Music Genre Classification Systems - A Computational Approach

In this paper music genre classification has been explored with special emphasis on the decision time horizon and ranking of tapped-delay-line short-time features. Late information fusion, e.g. majority voting, is compared with techniques of early information fusion such as dynamic PCA (DPCA). The most frequently suggested features in the literature were employed including mel-frequency cepstra...

Publication date: 2010